FEAT Add HarmfulQA dataset loader #1421

Open
romanlutz wants to merge 9 commits into Azure:main from romanlutz:romanlutz/add-harmful-qa-dataset

Conversation

@romanlutz
Contributor

Add remote dataset loader for HarmfulQA (declare-lab/HarmfulQA), containing ~2k harmful questions organized by academic topic and subtopic for testing LLM susceptibility to harm-inducing question-answering.

Copilot AI review requested due to automatic review settings March 1, 2026 14:14
@romanlutz romanlutz force-pushed the romanlutz/add-harmful-qa-dataset branch from f8de803 to e996238 Compare March 1, 2026 14:16
Contributor

Copilot AI left a comment

Pull request overview

Adds a new remote seed-dataset provider for the HuggingFace declare-lab/HarmfulQA dataset so it can be fetched and used as SeedPrompt entries within PyRIT’s dataset discovery/registration system.

Changes:

  • Introduced _HarmfulQADataset remote loader that fetches HarmfulQA from HuggingFace and converts rows into SeedPrompts.
  • Exported the new loader from pyrit.datasets.seed_datasets.remote to trigger auto-registration.
  • Added unit tests validating basic fetch + conversion behavior and dataset_name.
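The row-to-prompt conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SeedPromptSketch` is a hypothetical stand-in for PyRIT's `SeedPrompt`, the row keys `question`, `topic`, and `subtopic` are assumed field names for the HarmfulQA rows, and `"harmful_qa"` is an assumed dataset name. The rows are inlined so the example runs without network access.

```python
from dataclasses import dataclass, field


@dataclass
class SeedPromptSketch:
    """Hypothetical stand-in for PyRIT's SeedPrompt (the real class differs)."""

    value: str
    dataset_name: str
    harm_categories: list[str] = field(default_factory=list)


def rows_to_prompts(rows, dataset_name="harmful_qa"):
    """Convert HarmfulQA-style rows into prompt records, skipping blank questions."""
    prompts = []
    for row in rows:
        # Assumed key: "question". Missing or whitespace-only values are skipped.
        question = (row.get("question") or "").strip()
        if not question:
            continue
        # Assumed keys: "topic" and "subtopic", used here as category labels.
        categories = [c for c in (row.get("topic"), row.get("subtopic")) if c]
        prompts.append(
            SeedPromptSketch(
                value=question,
                dataset_name=dataset_name,
                harm_categories=categories,
            )
        )
    return prompts


# Placeholder rows standing in for what a remote fetch might return.
sample_rows = [
    {"topic": "Social Sciences", "subtopic": "Cultural Studies", "question": "Example question text?"},
    {"topic": "Mathematics and Logic", "subtopic": None, "question": "   "},
]
prompts = rows_to_prompts(sample_rows)
```

Keeping the conversion separate from the fetch step, as in this sketch, is one way to let unit tests exercise it with in-memory rows instead of a live HuggingFace download.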

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
pyrit/datasets/seed_datasets/remote/harmful_qa_dataset.py | New remote dataset loader implementation for HarmfulQA -> SeedDataset/SeedPrompt conversion.
pyrit/datasets/seed_datasets/remote/__init__.py | Re-export/import the new loader so it's discoverable/registered alongside other remote loaders.
tests/unit/datasets/test_harmful_qa_dataset.py | Unit tests for fetching/conversion and dataset_name behavior.

@romanlutz romanlutz force-pushed the romanlutz/add-harmful-qa-dataset branch from e996238 to d441180 Compare March 1, 2026 14:26
Add remote dataset loader for HarmfulQA (declare-lab/HarmfulQA), containing ~2k
harmful questions organized by academic topic and subtopic for testing LLM
susceptibility to harm-inducing question-answering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 13:00
@romanlutz romanlutz force-pushed the romanlutz/add-harmful-qa-dataset branch from d441180 to b4c033f Compare March 2, 2026 13:00
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

romanlutz and others added 2 commits March 2, 2026 05:36
The HF dataset identifier is now a class constant HF_DATASET_NAME
instead of a constructor parameter, consistent with other loaders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
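
The change from a constructor parameter to a class constant, as described in the commit message above, might look roughly like the sketch below. Only the name `HF_DATASET_NAME` and the identifier `declare-lab/HarmfulQA` come from this PR; both class names, the `hf_dataset_name` parameter, and the `source()` helper are hypothetical and exist only for illustration.

```python
class ParameterLoaderSketch:
    """Hypothetical 'before' shape: the identifier is supplied per instance."""

    def __init__(self, hf_dataset_name: str = "declare-lab/HarmfulQA") -> None:
        self.hf_dataset_name = hf_dataset_name

    def source(self) -> str:
        return self.hf_dataset_name


class ConstantLoaderSketch:
    """Hypothetical 'after' shape: the identifier is fixed on the class."""

    # Name taken from the commit message; the value is the dataset named in the PR.
    HF_DATASET_NAME: str = "declare-lab/HarmfulQA"

    def source(self) -> str:
        return self.HF_DATASET_NAME


before = ParameterLoaderSketch(hf_dataset_name="some-org/other-dataset")
after = ConstantLoaderSketch()
```

With the constant form, callers can no longer point the loader at a different dataset by accident, and the identifier can be read from the class without constructing an instance.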
Copilot AI review requested due to automatic review settings March 2, 2026 13:46
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

romanlutz and others added 2 commits March 2, 2026 05:53
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 14:02
romanlutz and others added 2 commits March 2, 2026 06:05
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
